Woylier: Alternative tour frame interpolation method

The woylier package implements alternative method for interpolation path between tour frames using Givens rotation.

Zoljargal Batsaikhan https://www.britannica.com/animal/quokka (Monash University) , Dianne Cook https://www.britannica.com/animal/bilby (Monash University) , Ursula Laa https://www.britannica.com/animal/bilby (BOKU University)
2022-10-26

1 Introduction

When data has up to 3 variables, visualization is relatively intuitive, while with more than three variables we face a challenge of visualizing high-dimensions onto 2D screens. This issue was tackled by grand tour(Asimov 1985) which can be used to view data in more than three dimensions using linear projections. It is based on idea of rotations of a lower dimensional projection in high-dimensional space. Grand tour allows users to see dynamic 2D projections of higher dimensional space. Originally, Asimov’s grand tour presents the viewer with an automatic movie of projections with no user control. Since then lots of work has been done about interactivity of the tour giving the control to users(Buja et al. 2005). The enhancement of grand tour includes manual tour, little tour, guided tour, local tour, and planned tour.

The guided tour gives user an exploration power by combining projection pursuit with tour which is implemented in the “tourr”(Wickham et al. 2011) package. The idea of projection pursuit is a procedure used to locate the projection of high-to-low dimensional space that expose the most interesting feature of data originally proposed by (Kruskal 1969). It involves defining a criterion of interest, numerical objective function, that indicates the interestingness of each projection and selecting planes with increasing value of the function. In the literature, there are a number of such criterion has been developed based on clustering, spread, and outliers.

Current implementation of guided tour in “tourr”(Wickham et al. 2011) package uses geodesic interpolation between planes. The interpolated paths based on geodesic interpolation between bases is visually invariant under changes of orientation. However, in some cases of non-linear projection pursuit the orientation of frames does matter. One example is splines2D index.(Laa and Cook 2020) The lack of rotation invariance of splines2d index raise complications in the optimization process. This rotational variety issue of non-linear projection pursuit functions was the motivation of this work.

Two side by side scatterplots with 6 points. The plot on the right hand side is 30 degree rotation of the left hand side. The calculated splines index is shown on top of each plots. Although they depicts the same points we can see that splines index is different which show the rotational variance of splines index.

Figure 1: The plot on the right hand side is 30 degree rotation of the left hand side. The calculated splines index is shown on top of each plots. Although they depicts the same points we can see that splines index is different which show the rotational variance of splines index.

A few alternatives to geodesic interpolation were proposed by (Buja et al. 2005) including decomposition of orthogonal matrices, Givens decomposition and Householder decomposition. The purpose of woylier package is to implement Givens paths method in R. This algorithms adapts Given’s matrix decomposition technique which allows the interpolation to be between frames rather than planes.

This article is structured as follows. The next section provided the theoretical framework of Givens interpolation method followed by a section about the implementation Givens path in woylier package that is compatible with current geodesic_path() function of tourr package. Furthermore, we will apply this interpolation method to projection pursuit of splines index to search for nonlinear associations between variables in example data set. Finally, this article includes a discussion about the further steps.

2 Background

The tour method of visualization is animated high-to-low dimensional data rotation that is a movie, one-parameter (time) family of static projections. Several algorithms for such dynamic projections are discussed in (Buja et al. 2005) is based on the idea of smoothly interpolating a discrete sequence of projections.

The topic of this article is the construction of paths of projections. Interpolation of paths of projection can be compared to connecting line segments that interpolate points in Euclidean space. Interpolation acts as a bridge between continuous animation and discrete choice of sequences of projections. Sequence of projections can be constructed in various ways depending on user purpose. If user wants to look at the data from all sides, a random sequence of projections can be used, which is implemented in grand tour(Wickham et al. 2011). Furthermore, the sequence of projections can be pre-computed, data-driven, or even manually controlled.

The interpolating paths of plane versus frames

Current implementation of tourr package(Wickham et al. 2011) is locally shortest (geodesic) interpolation of planes. The pitfall of this interpolation method is that it does not account for rotation variability. Therefor, the interpolation of frames is required when the orientation of projection matters. If the rendering on a frame and on the rotated version of the frame yields the same visual scenes, it means the orientation does not matter.

The orientation of frames could be important when non-linear projection pursuit function is used in guided tour. An illustration of such cases are shown below.

Plane to plane interpolation (left) and Frame to frame interpolation (right)Plane to plane interpolation (left) and Frame to frame interpolation (right)

Figure 2: Plane to plane interpolation (left) and Frame to frame interpolation (right)

Before continuing with the interpolation algorithms, we need to define the notations.

\[F^TF = I_d\]

Preprojection algorithm

In order to make the interpolation algorithm simple, we need to carry out “preprojection” step. The purpose of preprojection is to limit data subspace that the interpolation path, \(F(t)\), is traversing. In other words, preprojection step make sure the interpolation path between two frames \(F_a\) and \(F_z\) is not going to the data space that is not related to \(F_a\) and \(F_z\). Simply, prepojection algorithm is defining the joint subspace of \(F_a\) and \(F_z\).

The procedure starts with forming an orthonormal basis by applying Gram-Schmidt to \(F_z\) with regard to \(F_a\). We denote this orthonormal basis by \(F_\star\). The build preprojection basis \(B\) by combining \(F_a\) and \(F_\star\) as follows.

\[B = (F_a, F_{\star})\]

The dimension of the resulting orthonormal basis, \(B\), is \(p\times 2d\).

Then, we can express the original frames in terms of this basis:

\[F_a = B^TW_a, F_z = B^TW_z\]

The interpolation problem is then reduced to the construction of paths of frames \(W(t)\) that interpolate the projected frames \(W_a\) and \(W_z\). Because \(B\) is orthonormalized basis of \(F_z\) with regard to \(F_a\), \(W_a\) is \(2d\times d\) matrix of 1, 0s. This is an important character for our interpolation algorithm of choice, Givens interpolation.

Givens interpolation path algorithm

A rotation matrix is a transformation matrix that is used to perform a rotation in Euclidean space in xy plane. A rotation matrix that transforms 2-D plane by an angle \(\theta\) looks like this:

\[ \begin{bmatrix}\cos \theta &-\sin \theta \\\sin \theta &\cos \theta \end{bmatrix} \]

If the rotation is in the plane of selected 2 variables, it is called Givens rotation. Let’s denote those 2 variables \(i\) and \(j\). The Givens rotation is useful for introducing zeros on a grand scale and used for computing the QR decomposition of matrix in linear algebra problems. One advantage over other transformation methods which is particularly useful in our case is the ability to zero elements more selectively.

The interpolation methods in the woylier package is based on the fact that in any vector of a matrix, one can zero out the \(i\)-th coordinate with a Givens rotation in the \((i, j)\)-plane for any \(j\neq i\). This rotation affects only coordinate \(i\) and \(j\) and leave all other coordinates unchanged.

Sequences of Givens rotations can map any orthonormal d-frame F in p-space to standard d-frame \(E_d=((1, 0, 0, ...)^T, (0, 1, 0, ...)^T, ...)\).

The interpolation path construction algorithm from starting frame \(F_a\) to target frame \(F_z\) is illustrated below. The example is 2D path construction process of original 6D data frame.

  1. Construct preprojection basis \(B\) by orthonormalizing \(F_z\) with regards tp \(F_a\) with Gram-Schmidt.

In our example, \(F_a\) and \(F_z\) are \(p\times d\) or \(6\times2\) matrices that are orthonormal. The preprojection basis \(B\) is \(p\times 2d\) matrix that is \(6\times 4\).

  1. Get the preprojected frames using the preprojection basis \(B\). \[W_a = B^TF_a = E_d\] and \[W_z = B^TF_z\]

In our example, \(W_a\) looks like:

\[ \begin{bmatrix}1 & 0 \\0 &1 \\ 0&0 \\0&0\end{bmatrix} \]

\(W_z\) is orthonormal \(2d\times d\) matrix that looks like:

\[ \begin{bmatrix} a_{11} & a_{12} \\a_{21} &a_{22} \\ a_{31}&a_{32} \\a_{41}&a_{42}\end{bmatrix} \]

  1. Then, we can construct a sequence of Givens rotations that maps \(W_z\) to \(W_a\):

\[ W_a = R_m(\theta_m) ... R_2(\theta_2)R_1(\theta_1)W_z\]

At each rotation, the angle \(\theta_i\) that zero out the second coordinate of a plane is calculated.

When \(d = 2\), there are 5 rotations involved with 5 different angles that makes each elements 0. For example, the first rotation angle \(\theta_1\) is an angle between \((1, 0)\) and \((a_{11}, a_{21})\). This rotation matrix would make element \(a_{21}\) zero:

\[R_1(\theta_1) = G(1, 2, \theta_1) = \begin{bmatrix} cos\theta_1 & -sin\theta_1 & 0 & 0 \\sin\theta_1 &cos\theta_1 & 0 &0 \\ 0&0&1&0 \\0&0&0&1\end{bmatrix}\]

6th rotation is not necessary due to orthonormality of columns. If we make one element of a column 1 that means all other elements must 0.

  1. The inverse mapping is obtained by reversing the sequence of rotations with the negative of the angles, we starts from the starting basis and end at the target basis.

\[R(\theta) = R_1(-\theta_1) ... R_m(-\theta_m), \ W_z = R(\theta)W_a\] This step should include the time parameter, \(t\), so it shows the interpolation process rendered in the movie-like sequence.

  1. Finally, we reconstruct our original frame using \(B\). This reconstruction is done at each step of interpolation so we have interpolated path as result.

\[F_t = B * W_t\]

Projection pursuit index functions

The properties of several projection pursuit index functions were investigated.(Laa and Cook 2020) The smoothness, squintability, flexibility, rotation invariance, and speed of projection pursuit index functions were examined. The one property that is interesting to us is rotation invariance. The rotational invariance is examined by computing projection pursuit index fro different rotations within 2D plane. It is established that the dcor2d, splines2d and TIC index are not rotationally invariant. Splines2D index measures nonlinear association between variable by fitting spline model. It compares the variance of residuals and the functional dependence is stronger if the index value is larger.

Sphere and torus

In order to illustrate the interpolated path of projections we used geozoo(Schloerke 2016) package. 1D projection is plotted on unit sphere, while 2D projection is visualized on torus. The points on the surface of sphere and torus shape are randomly generated by functions from the geozoo(Schloerke 2016) package.

3 Implementation

We implemented each steps mentioned in Givens interpolation path algorithm in separate functions and combined them in givens_full_path() function. Here is the input and output of each functions and it’s descriptions.

Function Input Output Description
preprojection() Starting and target frame (Fa, Fz) B pre-projection px2d matrix Build a d-dimensional pre-projection space by orthonormalizing Fz with regard to Fa
construct_preframe() A frame and the pre-projection px2d matrix Preprojected frame in preprojection space Construct preprojected frames
row_rot() Starting and target frame (Fa, Fz) B pre-projection px2d matrix Build a d-dimensional pre-projection space by orthonormalizing Fz with regard to Fa
calculate_angles() Preprojected frames (Wa, Wz) Names list of angles Calculate angles of required rotations to map Wz to Wa
givens_rotation() Wa starting preprojected frame, list of angles of required rotations to map Wz to Wa, stepfraction Givens path It implements series of Givens rotations that maps Wa to Wz
construct_moving_frame() Pre-projection matrix B, Each frame of givens path A frame of on a step of interpolation Reconstruct interpolated frames using pre-projection
givens_full_path() Starting and target frame (Fa, Fz) and number of steps An array with nsteps matrix. Each matrix is interpolated frame in between starting and target frames. Construct full interpolated frames.

The interface of tour is that it renders one projection of data at a time. It displays one projection and asks for the next projection. Therefore, path of projections shown below is sequence of projections to be renders at tour display.

The givens_full_path() function returns the intermediate interpolation step projections in given number of steps. The code chunk below demonstrates, the interpolation between 2 random basis in 5 steps.

            [,1]
[1,]  0.24406482
[2,] -0.31814139
[3,] -0.24334450
[4,] -0.39166263
[5,] -0.08975114
[6,] -0.78647758
           [,1]
[1,] -0.6256883
[2,]  0.1641842
[3,]  0.4427114
[4,]  0.1426996
[5,]  0.5943406
[6,] -0.1093633
, , 1

            [,1]
[1,]  0.01086858
[2,] -0.27240617
[3,] -0.08264232
[4,] -0.35891689
[5,]  0.14040479
[6,] -0.87767429

, , 2

            [,1]
[1,] -0.22389989
[2,] -0.18726515
[3,]  0.09001476
[4,] -0.27425086
[5,]  0.35025000
[6,] -0.84190816

, , 3

            [,1]
[1,] -0.42627939
[2,] -0.07503468
[3,]  0.24965046
[4,] -0.14991218
[5,]  0.50942866
[6,] -0.68435306

, , 4

             [,1]
[1,] -0.566994057
[2,]  0.048050183
[3,]  0.373172156
[4,] -0.003887462
[5,]  0.594914254
[6,] -0.427800630

, , 5

           [,1]
[1,] -0.6256883
[2,]  0.1641842
[3,]  0.4427114
[4,]  0.1426996
[5,]  0.5943406
[6,] -0.1093633

4 Examples of interpolated paths

In this section we illustrate the use of givens_full_path() function by plotting the interpolated path between 2 frames.

Interpolated paths of 1D projection

1D projection of data in high dimension is a point. Therefore we can plot the point on the surface of a sphere. The following plot shows the Givens interpolation steps between 2 points, 1D projection of 6D data that is.

2 highlighted points on the surface of sphere connected by 10 interpolated steps, rotating.

Figure 3: Interpolation steps of 1D projections of 6D data

Interpolated paths of 2D projection

In case of 2D projections, we can plot the interpolated path between 2 frames on the surface of torus.

2 highlighted points on the surface of torus connected by 10 interpolated steps, rotating.

Figure 4: Interpolation steps of 2D projections of 6D data

5 Comparison of geodesic interpolation and Givens interpolation

Geodesic interpolation path (left) and Givens interpolation path (right) to target frameGeodesic interpolation path (left) and Givens interpolation path (right) to target frame

Figure 5: Geodesic interpolation path (left) and Givens interpolation path (right) to target frame

6 Conclusion

The R package woylier provides implementation of Givens interpolation path algorithm that can be used as an alternative interpolation method for tour. The algorithm implemented in the woylier package comes from (Buja et al. 2005). We illustrate the use of the functions provided in the package for R users.

The motivation to develop this package comes from rotational invariance problem of current geodesic interpolation algorithm implemeneted in tourr package(Wickham et al. 2011). The package gives users the ability to detect non-linear association between variables more precisely.

It is important to mention that woylier package should be integrated with tourr package(Wickham et al. 2011). The future improvements that needs to be done in the package is to generalize the interpolation for more than 2d projections of data.

D. Asimov. The grand tour: A tool for viewing multidimensional data. SIAM J Sci Stat Comput 6(1), 128–143, 1985.
A. Buja, D. Cook, D. Asimov and C. Hurley. Computational methods for high-dimensional rotations in data visualization. Handbook of Statistics, 391–413, 2005. DOI 10.1016/s0169-7161(04)24014-7.
JB. Kruskal. Toward a practical method which helps uncover the structure of a set of observations by finding the line transformation which optimizes a new “index of condensation. Statistical computation; New York, Academic Press, 427–440, 1969.
U. Laa and D. Cook. Using tours to visually investigate properties of new projection pursuit indexes with application to problems in physics. Comput Stat 35, 1171–1205, 2020. DOI https://doi.org/10.1007/s00180-020-00954-8.
B. Schloerke. Geozoo: Zoo of geometric objects. 2016. URL https://CRAN.R-project.org/package=geozoo. R package version 0.5.1.
H. Wickham, D. Cook, H. Hofmann and A. Buja. tourr: An R package for exploring multivariate data with projections. Journal of Statistical Software, 40(2): 1–18, 2011. URL https://doi.org/10.18637/jss.v040.i02.

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Batsaikhan, et al., "Woylier: Alternative tour frame interpolation method", The R Journal, 2022

BibTeX citation

@article{woylier_article,
  author = {Batsaikhan, Zoljargal and Cook, Dianne and Laa, Ursula},
  title = {Woylier: Alternative tour frame interpolation method},
  journal = {The R Journal},
  year = {2022},
  note = {https://doi.org/10.32614/woylier_article},
  doi = {10.32614/woylier_article},
  issn = {2073-4859},
  pages = {1}
}